Cross-Lingual Classification of Topics in Political Texts
Authors
Abstract
In this paper, we propose an approach for cross-lingual topical coding of sentences from electoral manifestos of political parties in different languages. To this end, we exploit continuous semantic text representations and induce a joint multilingual semantic vector space that enables supervised learning from manually coded sentences across different languages. Our experimental results show that classifiers trained on multilingual data yield performance gains over monolingual topic classification.
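A minimal sketch of the idea (not the authors' implementation): assuming hypothetical word vectors already aligned in a shared cross-lingual space, a sentence is embedded as the mean of its word vectors, and a nearest-centroid classifier trained on English manifesto sentences can then label German ones. All vectors, words, and topic labels below are toy values chosen for illustration.

```python
# Hypothetical pre-aligned word vectors in one shared cross-lingual space.
ALIGNED_VECTORS = {
    # English
    "tax": (1.0, 0.0), "budget": (0.9, 0.1),
    "school": (0.0, 1.0), "teacher": (0.1, 0.9),
    # German (assumed to be aligned into the same space)
    "steuer": (0.95, 0.05), "schule": (0.05, 0.95),
}

def embed(sentence):
    """Sentence embedding = mean of its known word vectors."""
    vecs = [ALIGNED_VECTORS[w] for w in sentence.lower().split()
            if w in ALIGNED_VECTORS]
    n = len(vecs)  # assumes at least one known word
    return tuple(sum(v[i] for v in vecs) / n for i in range(2))

def train_centroids(labelled):
    """One centroid per topic label, averaged over training sentences."""
    by_label = {}
    for sent, label in labelled:
        by_label.setdefault(label, []).append(embed(sent))
    return {lab: tuple(sum(v[i] for v in vs) / len(vs) for i in range(2))
            for lab, vs in by_label.items()}

def classify(sentence, centroids):
    """Assign the topic whose centroid is nearest in the shared space."""
    v = embed(sentence)
    return min(centroids,
               key=lambda lab: sum((v[i] - centroids[lab][i]) ** 2
                                   for i in range(2)))

# Train on English sentences only, then classify German input.
train = [("tax budget", "economy"), ("school teacher", "education")]
centroids = train_centroids(train)
print(classify("steuer", centroids))  # German sentence → "economy"
```

Because the German vectors live in the same space as the English training data, no labelled German sentences are needed at training time, which is the core of the cross-lingual transfer setup.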
Similar Papers
Unsupervised Cross-Lingual Scaling of Political Texts
Political text scaling aims to linearly order parties and politicians across political dimensions (e.g., left-to-right ideology) based on textual content (e.g., politician speeches or party manifestos). Existing models scale texts based on relative word usage and cannot be used for cross-lingual analyses. Additionally, there is little quantitative evidence that the output of these models correl...
Probabilistic topic modeling in multilingual settings: An overview of its methodology and applications
Probabilistic topic models are unsupervised generative models which model document content as a two-step generation process, that is, documents are observed as mixtures of latent concepts or topics, while topics are probability distributions over vocabulary words. Recently, a significant research effort has been invested into transferring the probabilistic topic modeling concept from monolingua...
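The two-step generation process described above can be illustrated with a toy sketch (the topic names and probabilities are assumed values, not taken from any paper): for each token, first draw a topic from the document's topic mixture, then draw a word from that topic's distribution over the vocabulary.

```python
import random

random.seed(0)

# Toy topics: each is a probability distribution over a small vocabulary.
TOPICS = {
    "politics": {"vote": 0.5, "party": 0.4, "goal": 0.1},
    "sports":   {"goal": 0.5, "team": 0.4, "vote": 0.1},
}

def sample(dist):
    """Draw one item from a {item: probability} distribution."""
    r, acc = random.random(), 0.0
    for item, p in dist.items():
        acc += p
        if r < acc:
            return item
    return item  # guard against floating-point rounding

def generate_document(topic_mixture, length):
    """Two-step generation: per token, draw a topic, then draw a word."""
    return [sample(TOPICS[sample(topic_mixture)]) for _ in range(length)]

doc = generate_document({"politics": 0.7, "sports": 0.3}, 10)
print(doc)
```

Topic model inference runs this process in reverse: given only the observed documents, it estimates the latent topic mixtures and topic-word distributions.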
Coarse-grained Cross-lingual Alignment of Comparable Texts with Topic Models and Encyclopedic Knowledge
We present a method for coarse-grained cross-lingual alignment of comparable texts: segments consisting of contiguous paragraphs that discuss the same theme (e.g. history, economy) are aligned based on induced multilingual topics. The method combines three ideas: a two level LDA model that filters out words that do not convey themes, an HMM that models the ordering of themes in the collection o...
Robust Cross-Lingual Genre Classification through Comparable Corpora
Classification of texts by genre can benefit applications in Natural Language Processing and Information Retrieval. However, a mono-lingual approach requires large amounts of labeled texts in the target language. Work reported here shows that the benefits of genre classification can be extended to other languages through cross-lingual methods. Comparable corpora – here taken to be collections o...
N-Gram Language Modeling for Robust Multi-Lingual Document Classification
Statistical n-gram language modeling is used in many domains like speech recognition, language identification, machine translation, character recognition and topic classification. Most language modeling approaches work on n-grams of terms. This paper reports on ongoing research in the MEMPHIS project which employs models based on character-level n-grams instead of term n-grams. The models ar...
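A hedged sketch of the character-level idea (my own illustration, not the MEMPHIS models): count character trigrams per class and classify with add-one-smoothed log-probabilities. Because character n-grams ignore word boundaries, the same model transfers across languages and tolerates noisy text.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Character n-grams of a lowercased, space-padded string."""
    text = f"  {text.lower()} "
    return [text[i:i + n] for i in range(len(text) - n + 1)]

class NgramClassifier:
    """Naive-Bayes-style classifier over character n-gram counts."""

    def __init__(self, n=3):
        self.n, self.counts, self.totals, self.vocab = n, {}, {}, set()

    def fit(self, labelled):
        for text, label in labelled:
            c = self.counts.setdefault(label, Counter())
            c.update(char_ngrams(text, self.n))
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        self.vocab = {g for c in self.counts.values() for g in c}

    def score(self, text, label):
        # Add-one smoothing so unseen n-grams do not zero out the score.
        c, total, v = self.counts[label], self.totals[label], len(self.vocab)
        return sum(math.log((c[g] + 1) / (total + v))
                   for g in char_ngrams(text, self.n))

    def predict(self, text):
        return max(self.counts, key=lambda lab: self.score(text, lab))

clf = NgramClassifier()
clf.fit([("the election results", "politics"),
         ("the football match", "sports")])
print(clf.predict("election"))  # → "politics"
```

The training data here is deliberately tiny; a real system would estimate the per-class n-gram counts from large labelled corpora in each language.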
Publication date: 2017